Improving the Computational Morphological Analysis of a Swahili Corpus for Lexicographic Purposes
Abstract
Computational morphological analysis is an important first step in the automatic treatment of natural language and a useful lexicographic tool. This article describes a corpus-based approach to the morphological analysis of Swahili. We particularly focus our discussion on its ability to retrieve lemmas for word forms and evaluate it as a tool for corpus-based dictionary
Similar resources
A heuristic for morpheme discovery based on string edit distance
This paper derives from work we have been doing on unsupervised learning of the morphology of languages with rich morphologies, that is, with a high average number of morphemes per word. Our focus in this paper is Swahili, a major Bantu language of East Africa, and our goal is the development of a system that can automatically produce a morphological analyzer of a text on the basis of a large c...
Using Syllables as Features in Morpheme Tagging in Swahili
Utilizing corpora to build morphological analyzers for the purposes of computational application has been addressed in many different ways. Methods for automated morphological analysis generally focus on segmentation from raw text, and ignore the actual learning of what morpheme features are present. Other methods are time-consuming and require a great deal of prior knowledge of the language su...
Hedges in English for Academic Purposes: A Corpus-based study of Iranian EFL learners
Hedges, as tools to express tentativeness and doubt, have been studied in plenty of research papers in the Iranian EFL research setting. However, their use in a learner corpus, portraying Iranian learner English, is in need of more research attention. With this end in view, this study aimed at investigating how Iranian EFL learners who have majored in English-related fields in Iran deployed hed...
Language Resources Development at the Spanish Royal Academy
This paper explains some of the most relevant issues concerning the development of language resources at the Spanish Royal Academy. Two 125-M words corpus of Spanish language (synchronic and diachronic) and three specialized corpus has been developed. Around the corpus, RAE is also developing NLP tools and resources to morpho-syntactically annotate them. Some of the most relevant are: ...
Disambiguation of morphological analysis in Bantu languages
The paper describes problems in disambiguating the morphological analysis of Bantu languages by using Swahili as a test language. The main factors of ambiguity in this language group can be traced to the noun class structure on one hand and to the bi-directional word-formation on the other. In analyzing word-forms, the system applied utilizes SWATWOL, a morphological parsing program based on tw...
Publication date: 2008